Add --link-targets-dir argument to linkchecker #143883

Open: pietroalbini wants to merge 4 commits into master

pietroalbini (Member) commented Jul 13, 2025

In my release notes API list tool (#143053) I want to check whether all links generated by the tool are actually valid, and using linkchecker seems to be the most sensible choice.

Linkchecker currently has a fairly big limitation though: it can only check a single directory, it checks all of the files within it, and link targets must point inside that same directory. This works great when checking the whole documentation package, but in my case I only need to check that one file contains valid links to the standard library docs.

To solve that, this PR adds a new --link-targets-dir flag to linkchecker. Directories passed to it will be valid link targets (with lower priority than the root being checked), but links within them will not be checked.
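
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `resolve_target` helper and its parameters are hypothetical, and the only behavior taken from the description is that the checked root is tried before the extra link-target directories.

```rust
use std::iter::once;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of linkchecker): returns the first existing
/// file for `relative`, trying the checked root first and then each extra
/// link-targets directory in the order given.
fn resolve_target(
    root: &Path,
    link_targets_dirs: &[PathBuf],
    relative: &str,
) -> Option<PathBuf> {
    once(root)
        .chain(link_targets_dirs.iter().map(|dir| dir.as_path()))
        .map(|base| base.join(relative))
        // The root comes first in the chain, so it wins when both contain the file.
        .find(|candidate| candidate.is_file())
}
```

Only files under the root would then be scanned for links; the extra directories are consulted purely as places a link may point to.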

I'm not that happy with the name of the flag, so I'm happy for it to be bikeshedded.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Jul 13, 2025
@pietroalbini pietroalbini marked this pull request as ready for review July 13, 2025 11:00
rustbot (Collaborator) commented Jul 13, 2025

r? @ehuss

rustbot has assigned @ehuss.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 13, 2025
rustbot (Collaborator) commented Jul 13, 2025

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@pietroalbini pietroalbini force-pushed the pa-linkchecker-extra-target branch 3 times, most recently from d51bfa1 to 6831c80 Compare July 13, 2025 11:16
ehuss (Contributor) left a comment

Can you help me understand a little more about how your tool works? Is it generating some intermediate files with relative paths, and then translating them to absolute paths? What do the directory structures look like?

pietroalbini (Member, Author) commented

> Can you help me understand a little more about how your tool works? Is it generating some intermediate files with relative paths, and then translating them to absolute paths? What do the directory structures look like?

The relnotes-api-list tool generates a JSON file with all stabilized APIs in the standard library, and their documentation URLs. These URLs look like std/option/enum.Option.html#method.unwrap for Option::unwrap.

I want to add a step to the tool verifying all those links are correct. To do so, my current implementation generates a temporary HTML file with an <a> tag for each link. I then generate the standard library docs, and I want to ensure all the links in the temporary file are valid.
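
As a rough sketch of the temporary-file step described above (an illustration only, not the tool's actual code), generating one `<a>` tag per URL could look like this. The `render_link_page` name and the exact HTML skeleton are assumptions.

```rust
/// Hypothetical helper: renders a minimal HTML page with one `<a>` tag per
/// documentation URL, so a link checker can validate each target.
fn render_link_page(urls: &[&str]) -> String {
    let mut html = String::from("<!DOCTYPE html>\n<html><body>\n");
    for url in urls {
        // Escape characters that are special in HTML attributes and text.
        // `&` must be replaced first so later entities are not double-escaped.
        let escaped = url
            .replace('&', "&amp;")
            .replace('"', "&quot;")
            .replace('<', "&lt;")
            .replace('>', "&gt;");
        html.push_str(&format!("<a href=\"{escaped}\">{escaped}</a>\n"));
    }
    html.push_str("</body></html>\n");
    html
}
```

For example, passing `std/option/enum.Option.html#method.unwrap` produces a single anchor whose `href` is that relative URL.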

In practice, this would be like placing my temporary file in build/host/doc, and running linkchecker on that directory. There are two downsides I found when doing that:

  • I either need to be extra careful to ensure the temporary file is removed from build/host/doc, or I need to copy all of build/host/doc into a temporary directory and run linkchecker on that (which I guess won't be the fastest thing on Windows).
  • Pointing linkchecker to build/host/doc will also check the links in the standard library docs, which also requires generating the books and the pages the books point to.

That's why I decided to add the --link-targets-dir flag: doing --link-targets-dir build/host/doc avoids the need to copy it around, and prevents linkchecker from checking its links (while still allowing other links to link to it).
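
A minimal sketch of how a repeatable `--link-targets-dir` flag could be parsed by hand. The `Cli` field names, the `parse_args` function, and the single positional argument are assumptions made for illustration; only the flag name comes from this PR.

```rust
use std::path::PathBuf;

/// Hypothetical argument struct; the field names are assumptions.
#[derive(Debug, PartialEq)]
struct Cli {
    /// Directory whose files are checked for broken links.
    docs: PathBuf,
    /// Extra directories that are valid link targets but are not checked.
    link_targets_dirs: Vec<PathBuf>,
}

fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Cli, String> {
    let mut args = args.into_iter();
    let mut docs = None;
    let mut link_targets_dirs = Vec::new();
    while let Some(arg) = args.next() {
        if arg == "--link-targets-dir" {
            // The flag may be repeated; each occurrence consumes one value.
            let dir = args.next().ok_or("missing value for --link-targets-dir")?;
            link_targets_dirs.push(PathBuf::from(dir));
        } else if docs.is_none() {
            docs = Some(PathBuf::from(arg));
        } else {
            return Err(format!("unexpected argument: {arg}"));
        }
    }
    let docs = docs.ok_or("missing path to the directory to check")?;
    Ok(Cli { docs, link_targets_dirs })
}
```

With this sketch, the arguments `tmp-dir --link-targets-dir build/host/doc` would check `tmp-dir` while accepting links that resolve inside `build/host/doc`.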

ehuss (Contributor) commented Jul 16, 2025

Ah, that makes sense, thanks for the explanation!

@pietroalbini pietroalbini force-pushed the pa-linkchecker-extra-target branch from 6831c80 to 2b12b92 Compare July 17, 2025 08:51
@pietroalbini pietroalbini force-pushed the pa-linkchecker-extra-target branch from 2b12b92 to 2995467 Compare July 17, 2025 08:52
ehuss (Contributor) left a comment

r=ehuss with the Windows issue fixed.

Comment on lines -430 to +489
let entry =
self.cache.entry(pretty_path.clone()).or_insert_with(|| match fs::metadata(file) {
for base in once(&self.root).chain(self.link_targets_dirs.iter()) {

Changing this from a closure to a loop breaks on Windows, because the return value is no longer the correct type. I think something roughly like this should fix it:

--- a/src/tools/linkchecker/main.rs
+++ b/src/tools/linkchecker/main.rs
@@ -439,15 +439,18 @@ fn load_file(&mut self, file: &Path, report: &mut Report) -> (String, &FileEntry
                     }
                 }
                 Err(e) if e.kind() == ErrorKind::NotFound => FileEntry::Missing,
-                Err(e) => {
-                    // If a broken intra-doc link contains `::`, on windows, it will cause `ERROR_INVALID_NAME` rather than `NotFound`.
-                    // Explicitly check for that so that the broken link can be allowed in `LINKCHECK_EXCEPTIONS`.
-                    #[cfg(windows)]
+                // If a broken intra-doc link contains `::`, on windows, it
+                // will cause `ERROR_INVALID_NAME` rather than `NotFound`.
+                // Explicitly check for that so that the broken link can be
+                // allowed in `LINKCHECK_EXCEPTIONS`.
+                #[cfg(windows)]
+                Err(e)
                     if e.raw_os_error() == Some(ERROR_INVALID_NAME)
-                        && file.as_os_str().to_str().map_or(false, |s| s.contains("::"))
-                    {
-                        return FileEntry::Missing;
-                    }
+                        && file.as_os_str().to_str().map_or(false, |s| s.contains("::")) =>
+                {
+                    FileEntry::Missing
+                }
+                Err(e) => {
                     panic!("unexpected read error for {}: {}", file.display(), e);
                 }
             });
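
The shape of the suggested fix, with the early `return` replaced by a guarded match arm, can be sketched in isolation. This is a simplified stand-in rather than the linkchecker code: the `Entry` enum, the `classify` function, and the `INVALID_NAME` constant are hypothetical, and the fallthrough arm returns a value where the real code panics.

```rust
use std::io::{Error, ErrorKind};

/// Simplified stand-in for the entry type used in the diff above.
#[derive(Debug, PartialEq)]
enum Entry {
    Missing,
    Other,
}

/// Stand-in for the Windows `ERROR_INVALID_NAME` code (123); an assumption
/// for this sketch so that it compiles on every platform.
const INVALID_NAME: i32 = 123;

/// Every arm is an expression of type `Entry`, so the match behaves the same
/// inside a closure or a loop body; no `return` is involved.
fn classify(err: &Error, path: &str) -> Entry {
    match err {
        e if e.kind() == ErrorKind::NotFound => Entry::Missing,
        // Guarded arm replacing the `if ... { return ...; }` block.
        e if e.raw_os_error() == Some(INVALID_NAME) && path.contains("::") => Entry::Missing,
        _ => Entry::Other,
    }
}
```

A `return` inside a closure exits only the closure, whereas inside a loop it exits the enclosing function, whose return type differs. Expressing the special case as a match arm avoids the question entirely.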

    report.report();
    if report.errors != 0 {
        println!("found some broken links");
        std::process::exit(1);
    }
}

fn parse_cli() -> Result<Cli, String> {
    fn to_canonical_path(arg: &str) -> Result<PathBuf, String> {
        PathBuf::from(arg).canonicalize().map_err(|e| format!("could not canonicalize {arg}: {e}"))

For the record, I'm always slightly uneasy with ever using canonicalize. I don't think there is anything to change here since it seems to be working. I just wanted to note the risk here.
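
The comment does not say which risk is meant. For illustration only, two documented behaviors of `canonicalize` that often motivate this kind of caution are that it fails when the path does not exist, and that on Windows it returns extended-length (`\\?\`) paths. The helper below mirrors the one quoted above so those behaviors can be tried in isolation.

```rust
use std::path::PathBuf;

/// Mirrors the helper quoted in the review: canonicalizes `arg` and turns
/// any I/O error into a human-readable message.
fn to_canonical_path(arg: &str) -> Result<PathBuf, String> {
    PathBuf::from(arg)
        .canonicalize()
        // Fails if the path does not exist or cannot be resolved.
        .map_err(|e| format!("could not canonicalize {arg}: {e}"))
}
```

Calling it on a directory that has not been generated yet returns an `Err`, and the successful result is always an absolute path with symlinks resolved.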
